AITopics | learning stochastic

Collaborating Authors

learning stochastic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing SystemsDec-24-2025, 13:26:40 GMT

This work studies the problem of learning episodic Markov Decision Processes with known transition and bandit feedback. We develop the first algorithm with a ``best-of-both-worlds'' guarantee: it achieves O(log T) regret when the losses are stochastic, and simultaneously enjoys worst-case robustness with \tilde{O}(\sqrt{T}) regret even when the losses are adversarial, where T is the number of episodes. More generally, it achieves \tilde{O}(\sqrt{C}) regret in an intermediate setting where the losses are corrupted by a total amount of C. Our algorithm is based on the Follow-the-Regularized-Leader method from Zimin and Neu (2013), with a novel hybrid regularizer inspired by recent works of Zimmert et al. (2019a, 2019b) for the special case of multi-armed bandits. Crucially, our regularizer admits a non-diagonal Hessian with a highly complicated inverse. Analyzing such a regularizer and deriving a particular self-bounding regret guarantee is our key technical contribution and might be of independent interest.

learning stochastic, name change, stochastic and adversarial episodic mdp, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Add feedback

Review for NeurIPS paper: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing SystemsMay-31-2025, 17:08:22 GMT

Weaknesses: Soundness of the claims: While the general ideas seem somewhat clear, most of the proofs read more like sketches and some of the claims need to be verified by hand. For example in the proof of Lemma 9 the authors do not provide a derivation for the each of the elements of the Hessian but merely state that the result follows from a direct computation (which seems to be left as an exercise to the reader). Other examples where more details would not hurt are the proof of Lemma 13, where the authors bound \ q_t - \tilde \q_t\ in a self-bounding way, to achieve the upper bound but no mention of this is given other than the final result and the proof of Lemma 27, where derivations on lines 720-722 were not entirely straightforward. Overall the Appendix can benefit from more details in the proofs. Significance and novelty of contribution: The core ideas of this work are not novel.

algorithm, learning stochastic, stochastic and adversarial episodic mdp, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Review for NeurIPS paper: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing SystemsMay-31-2025, 17:08:13 GMT

This paper was well-received by the reviewers and the author response was effective in addressing the concerns raised in the initial reviews. As a result, several reviewers updated their scores. The paper is clearly suitable for publication without significant changes.

learning stochastic, neurips paper, stochastic and adversarial episodic mdp, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing SystemsOct-11-2024, 06:22:03 GMT

This work studies the problem of learning episodic Markov Decision Processes with known transition and bandit feedback. We develop the first algorithm with a best-of-both-worlds'' guarantee: it achieves O(log T) regret when the losses are stochastic, and simultaneously enjoys worst-case robustness with \tilde{O}(\sqrt{T}) regret even when the losses are adversarial, where T is the number of episodes. More generally, it achieves \tilde{O}(\sqrt{C}) regret in an intermediate setting where the losses are corrupted by a total amount of C. Our algorithm is based on the Follow-the-Regularized-Leader method from Zimin and Neu (2013), with a novel hybrid regularizer inspired by recent works of Zimmert et al. (2019a, 2019b) for the special case of multi-armed bandits. Crucially, our regularizer admits a non-diagonal Hessian with a highly complicated inverse. Analyzing such a regularizer and deriving a particular self-bounding regret guarantee is our key technical contribution and might be of independent interest.

learning stochastic, stochastic and adversarial episodic mdp, transition, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.75)

Add feedback